Transformer in Transformer Supplemental Material

Neural Information Processing Systems

We can see that for both DeiT-S and TNT-S, more patches become related as the layer goes deeper. An MLP is used to calculate the attention values, and the attention is multiplied to all the embeddings; a sketch of this mechanism is given below. We extract the features from different layers of TNT to construct multi-scale features; a sketch of this construction follows the first one. The COCO2017 val results are shown in Table 2, where TNT achieves much better results.

Table 2: Results of Faster R-CNN object detection on the COCO minival set with ImageNet pre-training.
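As a rough illustration of the attention mechanism described above, here is a minimal PyTorch sketch of an SE-style module in which an MLP computes attention values that are multiplied to all token embeddings. The module name, reduction ratio, and sigmoid gating are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn

class TokenSE(nn.Module):
    """SE-style attention over token embeddings (a sketch; names and
    hyperparameters are illustrative, not from the paper)."""
    def __init__(self, dim, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim),
            nn.Sigmoid(),  # attention values in (0, 1)
        )

    def forward(self, x):            # x: (batch, num_tokens, dim)
        s = x.mean(dim=1)            # squeeze: average over all tokens
        a = self.mlp(s)              # MLP calculates the attention values
        return x * a.unsqueeze(1)    # attention multiplied to all embeddings
```

For example, `TokenSE(384)` applied to a `(2, 197, 384)` token tensor returns a gated tensor of the same shape.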
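The multi-scale feature construction can be sketched in the same spirit: patch tokens taken from several depths of the backbone are reshaped into 2D feature maps and resized to FPN-style strides before being fed to the Faster R-CNN head. The layer selection, the base patch stride of 16, and the bilinear resizing below are hypothetical choices, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def multi_scale_features(patch_tokens_per_layer, hw, strides=(4, 8, 16, 32)):
    """Build a feature pyramid from transformer layers (a sketch under
    assumed settings; base patch stride taken to be 16).

    patch_tokens_per_layer: list of (B, N, C) patch-token tensors from
    different depths of the backbone; hw = (H, W) is the patch grid size.
    """
    H, W = hw
    feats = []
    for tokens, s in zip(patch_tokens_per_layer, strides):
        B, N, C = tokens.shape
        fmap = tokens.transpose(1, 2).reshape(B, C, H, W)  # tokens -> 2D map
        # resize so the pyramid matches the strides an FPN head expects
        fmap = F.interpolate(fmap, scale_factor=16 / s, mode="bilinear",
                             align_corners=False)
        feats.append(fmap)
    return feats
```

Maps with a target stride smaller than the patch stride are upsampled and coarser ones downsampled, so the resulting pyramid matches what a standard FPN-based detector consumes.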